
[Dec 25, 2021] PracticeVCE Professional-Data-Engineer dumps & Google Cloud Certified sure practice dumps
Google Professional-Data-Engineer Actual Questions and Braindumps
Google Professional-Data-Engineer Exam Syllabus Topics:
| Topic | Details |
|---|---|
| Topic 1 | |
| Topic 2 | |
| Topic 3 | |
| Topic 4 | |
| Topic 5 | |
| Topic 6 | |
NEW QUESTION 50
Your company has a hybrid cloud initiative. You have a complex data pipeline that moves data between cloud provider services and leverages services from each of the cloud providers. Which cloud-native service should you use to orchestrate the entire pipeline?
- A. Cloud Dataproc
- B. Cloud Dataprep
- C. Cloud Composer
- D. Cloud Dataflow
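Answer: C
Explanation:
Cloud Composer is the managed workflow orchestration service built on Apache Airflow, and it can coordinate tasks that run on Google Cloud, on other cloud providers, and on-premises. Cloud Dataproc runs Spark and Hadoop jobs; it does not orchestrate a multi-service pipeline.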
NEW QUESTION 51
You are creating a model to predict housing prices. Due to budget constraints, you must run it on a single resource-constrained virtual machine. Which learning algorithm should you use?
- A. Feedforward neural network
- B. Linear regression
- C. Logistic classification
- D. Recurrent neural network
Answer: B
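Explanation:
Predicting a price is a regression problem, and linear regression is the cheapest of the listed options to train and serve. As a rough illustration of how little compute it needs, the sketch below fits a one-feature model with the closed-form least-squares formulas, using only the Python standard library. The function name `fit_line` and the floor-area and price figures are made up for the example and are not part of the question.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and co-deviation of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical training data: floor area (square metres) and price (thousands).
areas = [50, 80, 100, 120]
prices = [150, 240, 300, 360]

slope, intercept = fit_line(areas, prices)
predicted_price = slope * 90 + intercept  # prediction for a 90 square metre home
```

Training here is a single pass over the data with no iterative optimization, which is why this kind of model fits comfortably on a small virtual machine.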
NEW QUESTION 52
You have a requirement to insert minute-resolution data from 50,000 sensors into a BigQuery table. You expect significant growth in data volume and need the data to be available within 1 minute of ingestion for real-time analysis of aggregated trends. What should you do?
- A. Use the INSERT statement to insert a batch of data every 60 seconds.
- B. Use bq load to load a batch of sensor data every 60 seconds.
- C. Use a Cloud Dataflow pipeline to stream data into the BigQuery table.
- D. Use the MERGE statement to apply updates in batch every 60 seconds.
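Answer: C
Explanation:
Streaming the data into BigQuery through a Cloud Dataflow pipeline makes rows available for analysis within seconds and scales with the growing data volume. Batch INSERT, MERGE, and bq load jobs every 60 seconds are subject to job and DML limits and cannot reliably meet the 1-minute availability requirement.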
NEW QUESTION 53
You have a Google Cloud Dataflow streaming pipeline running with a Google Cloud Pub/Sub subscription as the source. You need to make an update to the code that will make the new Cloud Dataflow pipeline incompatible with the current version. You do not want to lose any data when making this update. What should you do?
- A. Create a new pipeline that has a new Cloud Pub/Sub subscription and cancel the old pipeline.
- B. Update the current pipeline and provide the transform mapping JSON object.
- C. Create a new pipeline that has the same Cloud Pub/Sub subscription and cancel the old pipeline.
- D. Update the current pipeline and use the drain flag.
Answer: B
Explanation:
If any transform names in your pipeline have changed, you must supply a transform mapping and pass it using the --transformNameMapping option.
https://cloud.google.com/dataflow/docs/guides/updating-a-pipeline#preventing_compatibility_breaks
NEW QUESTION 54
Which is the preferred method to use to avoid hotspotting in time series data in Bigtable?
- A. Salting
- B. Randomization
- C. Hashing
- D. Field promotion
Answer: D
Explanation:
By default, prefer field promotion. Field promotion avoids hotspotting in almost all cases, and it tends to make it easier to design a row key that facilitates queries.
Reference:
https://cloud.google.com/bigtable/docs/schema-design-time-series#ensure_that_your_row_key_avoids_hotspotti
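Field promotion means moving an identifying field from the column data into the row key, ahead of the timestamp, so that writes arriving at the same moment are spread across different key ranges. The sketch below illustrates the idea in plain Python without a Bigtable client. The function name `promoted_row_key`, the `#` separator, and the sensor names are assumptions made for the example.

```python
def promoted_row_key(device_id: str, epoch_seconds: int) -> str:
    """Build a row key with the device ID promoted in front of the timestamp.

    The timestamp is zero-padded so that lexicographic order matches
    chronological order within one device.
    """
    return f"{device_id}#{epoch_seconds:010d}"


# Hypothetical readings that arrive at the same two instants from two devices.
readings = [
    ("sensor-b", 1640390400),
    ("sensor-a", 1640390400),
    ("sensor-a", 1640390460),
    ("sensor-b", 1640390460),
]

# Sorting mimics how a sorted key-value store lays rows out.
row_keys = sorted(promoted_row_key(device, ts) for device, ts in readings)
```

With the device ID first, rows for each device sit together in time order, and simultaneous writes land in different parts of the key space. With the timestamp first, every new write would go to the end of the table.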
NEW QUESTION 55
You are designing storage for 20 TB of text files as part of deploying a data pipeline on Google Cloud. Your input data is in CSV format. You want to minimize the cost of querying aggregate values for multiple users who will query the data in Cloud Storage with multiple engines. Which storage service and schema design should you use?
- A. Use Cloud Storage for storage. Link as temporary tables in BigQuery for query.
- B. Use Cloud Storage for storage. Link as permanent tables in BigQuery for query.
- C. Use Cloud Bigtable for storage. Link as permanent tables in BigQuery for query.
- D. Use Cloud Bigtable for storage. Install the HBase shell on a Compute Engine instance to query the Cloud Bigtable data.
Answer: B
NEW QUESTION 56
Your neural network model is taking days to train. You want to increase the training speed. What can you do?
- A. Increase the number of layers in your neural network.
- B. Subsample your training dataset.
- C. Increase the number of input features to your model.
- D. Subsample your test dataset.
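Answer: B
Explanation:
Training time grows with the amount of training data, so subsampling the training dataset speeds up training. Adding layers or input features increases the work per step, and subsampling the test dataset affects only evaluation, not training.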
NEW QUESTION 57
You use a dataset in BigQuery for analysis. You want to provide third-party companies with access to the same dataset. You need to keep the costs of data sharing low and ensure that the data is current. Which solution should you choose?
- A. Create an authorized view on the BigQuery table to control data access, and provide third-party companies with access to that view.
- B. Use Cloud Scheduler to export the data on a regular basis to Cloud Storage, and provide third-party companies with access to the bucket.
- C. Create a Cloud Dataflow job that reads the data in frequent time intervals, and writes it to the relevant BigQuery dataset or Cloud Storage bucket for third-party companies to use.
- D. Create a separate dataset in BigQuery that contains the relevant data to share, and provide third-party companies with access to the new dataset.
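Answer: A
Explanation:
An authorized view exposes the current data without copying it, so there are no extra storage or export costs and the third parties always see up-to-date results. Scheduled exports and copied datasets add cost and can become stale.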
NEW QUESTION 58
When you design a Google Cloud Bigtable schema it is recommended that you _________.
- A. Create schema designs that require atomicity across rows
- B. Avoid schema designs that are based on NoSQL concepts
- C. Create schema designs that are based on a relational database design
- D. Avoid schema designs that require atomicity across rows
Answer: D
Explanation:
All operations are atomic at the row level. For example, if you update two rows in a table, it's possible that one row will be updated successfully and the other update will fail. Avoid schema designs that require atomicity across rows.
Reference: https://cloud.google.com/bigtable/docs/schema-design#row-keys
NEW QUESTION 59
You work for a shipping company that uses handheld scanners to read shipping labels. Your company has strict data privacy standards that require scanners to only transmit tracking numbers when events are sent to Kafka topics. A recent software update caused the scanners to accidentally transmit recipients' personally identifiable information (PII) to analytics systems, which violates user privacy rules. You want to quickly build a scalable solution using cloud-native managed services to prevent exposure of PII to the analytics systems.
What should you do?
- A. Build a Cloud Function that reads the topics and makes a call to the Cloud Data Loss Prevention API. Use the tagging and confidence levels to either pass or quarantine the data in a bucket for review.
- B. Use Stackdriver logging to analyze the data passed through the total pipeline to identify transactions that may contain sensitive information.
- C. Install a third-party data validation tool on Compute Engine virtual machines to check the incoming data for sensitive information.
- D. Create an authorized view in BigQuery to restrict access to tables with sensitive data.
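Answer: A
Explanation:
Cloud Functions and the Cloud Data Loss Prevention API are managed, scalable services that can inspect each message for PII and quarantine it before it reaches the analytics systems. An authorized view only restricts access after the sensitive data has already been stored.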
NEW QUESTION 60
You create an important report for your large team in Google Data Studio 360. The report uses Google BigQuery as its data source. You notice that visualizations are not showing data that is less than 1 hour old. What should you do?
- A. Refresh your browser tab showing the visualizations.
- B. Disable caching by editing the report settings.
- C. Clear your browser history for the past hour then reload the tab showing the visualizations.
- D. Disable caching in BigQuery by editing table details.
Answer: B
Explanation:
https://support.google.com/datastudio/answer/7020039?hl=en
NEW QUESTION 61
Which of the following is NOT a valid use case to select HDD (hard disk drives) as the storage for Google Cloud Bigtable?
- A. You will not use the data to back a user-facing or latency-sensitive application.
- B. You expect to store at least 10 TB of data.
- C. You need to integrate with Google BigQuery.
- D. You will mostly run batch workloads with scans and writes, rather than frequently executing random reads of a small number of rows.
Answer: C
Explanation:
For example, if you plan to store extensive historical data for a large number of remote-sensing devices and then use the data to generate daily reports, the cost savings for HDD storage may justify the performance tradeoff. On the other hand, if you plan to use the data to display a real-time dashboard, it probably would not make sense to use HDD storage: reads would be much more frequent in this case, and reads are much slower with HDD storage.
Reference: https://cloud.google.com/bigtable/docs/choosing-ssd-hdd
NEW QUESTION 62
You operate a database that stores stock trades and an application that retrieves average stock price for a given company over an adjustable window of time. The data is stored in Cloud Bigtable where the datetime of the stock trade is the beginning of the row key. Your application has thousands of concurrent users, and you notice that performance is starting to degrade as more stocks are added. What should you do to improve the performance of your application?
- A. Use Cloud Dataflow to write summary of each day's stock trades to an Avro file on Cloud Storage. Update your application to read from Cloud Storage and Cloud Bigtable to compute the responses.
- B. Change the row key syntax in your Cloud Bigtable table to begin with the stock symbol.
- C. Change the row key syntax in your Cloud Bigtable table to begin with a random number per second.
- D. Change the data pipeline to use BigQuery for storing stock trades, and update your application.
Answer: B
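Explanation:
Bigtable stores rows sorted by row key. When the key begins with the stock symbol, all trades for one company over a time window form one contiguous key range, and reads for different symbols are spread across the key space. The sketch below imitates such a range scan over an in-memory sorted list. The helper `scan_range`, the `SYMBOL#epoch` key layout, and the sample keys are assumptions made for the example and do not use the Bigtable client.

```python
from bisect import bisect_left


def scan_range(sorted_keys, start_key, end_key):
    """Return the keys in [start_key, end_key) from a sorted list of row keys."""
    lo = bisect_left(sorted_keys, start_key)
    hi = bisect_left(sorted_keys, end_key)
    return sorted_keys[lo:hi]


# Hypothetical row keys: stock symbol first, then the trade time in epoch seconds.
keys = sorted([
    "GOOG#1640390400", "GOOG#1640390460", "GOOG#1640390520",
    "MSFT#1640390400", "MSFT#1640390460",
])

# All GOOG trades in a two-minute window are one contiguous slice of the table.
window = scan_range(keys, "GOOG#1640390400", "GOOG#1640390520")
```

With the datetime first, the same query would have to read every symbol's rows in the window and filter them, and all current writes would target the same end of the table.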
NEW QUESTION 63
MJTelco Case Study
Company Overview
MJTelco is a startup that plans to build networks in rapidly growing, underserved markets around the world.
The company has patents for innovative optical communications hardware. Based on these patents, they can create many reliable, high-speed backbone links with inexpensive hardware.
Company Background
Founded by experienced telecom executives, MJTelco uses technologies originally developed to overcome communications challenges in space. Fundamental to their operation, they need to create a distributed data infrastructure that drives real-time analysis and incorporates machine learning to continuously optimize their topologies. Because their hardware is inexpensive, they plan to overdeploy the network allowing them to account for the impact of dynamic regional politics on location availability and cost.
Their management and operations teams are situated all around the globe, creating a many-to-many relationship between data consumers and providers in their system. After careful consideration, they decided public cloud is the perfect environment to support their needs.
Solution Concept
MJTelco is running a successful proof-of-concept (PoC) project in its labs. They have two primary needs:
* Scale and harden their PoC to support significantly more data flows generated when they ramp to more than 50,000 installations.
* Refine their machine-learning cycles to verify and improve the dynamic models they use to control topology definition.
MJTelco will also use three separate operating environments - development/test, staging, and production - to meet the needs of running experiments, deploying new features, and serving production customers.
Business Requirements
* Scale up their production environment with minimal cost, instantiating resources when and where needed in an unpredictable, distributed telecom user community.
* Ensure security of their proprietary data to protect their leading-edge machine learning and analysis.
* Provide reliable and timely access to data for analysis from distributed research workers.
* Maintain isolated environments that support rapid iteration of their machine-learning models without affecting their customers.
Technical Requirements
* Ensure secure and efficient transport and storage of telemetry data.
* Rapidly scale instances to support between 10,000 and 100,000 data providers with multiple flows each.
* Allow analysis and presentation against data tables tracking up to 2 years of data storing approximately 100m records/day.
* Support rapid iteration of monitoring infrastructure focused on awareness of data pipeline problems both in telemetry flows and in production learning cycles.
CEO Statement
Our business model relies on our patents, analytics and dynamic machine learning. Our inexpensive hardware is organized to be highly reliable, which gives us cost advantages. We need to quickly stabilize our large distributed data pipelines to meet our reliability and capacity commitments.
CTO Statement
Our public cloud services must operate as advertised. We need resources that scale and keep our data secure.
We also need environments in which our data scientists can carefully study and quickly adapt our models.
Because we rely on automation to process our data, we also need our development and test environments to work as we iterate.
CFO Statement
The project is too large for us to maintain the hardware and software required for the data and analysis. Also, we cannot afford to staff an operations team to monitor so many data feeds, so we will rely on automation and infrastructure. Google Cloud's machine learning will allow our quantitative researchers to work on our high-value problems instead of problems with our data pipelines.
MJTelco is building a custom interface to share data. They have these requirements:
* They need to do aggregations over their petabyte-scale datasets.
* They need to scan specific time range rows with a very fast response time (milliseconds).
Which combination of Google Cloud Platform products should you recommend?
- A. BigQuery and Cloud Storage
- B. BigQuery and Cloud Bigtable
- C. Cloud Datastore and Cloud Bigtable
- D. Cloud Bigtable and Cloud SQL
Answer: B
NEW QUESTION 64
You have spent a few days loading data from comma-separated values (CSV) files into the Google BigQuery table CLICK_STREAM. The column DT stores the epoch time of click events. For convenience, you chose a simple schema where every field is treated as the STRING type. Now, you want to compute web session durations of users who visit your site, and you want to change its data type to the TIMESTAMP. You want to minimize the migration effort without making future queries computationally expensive. What should you do?
- A. Construct a query to return every row of the table CLICK_STREAM, while using the built-in function to cast strings from the column DT into TIMESTAMP values. Run the query into a destination table NEW_CLICK_STREAM, in which the column TS is the TIMESTAMP type. Reference the table NEW_CLICK_STREAM instead of the table CLICK_STREAM from now on. In the future, new data is loaded into the table NEW_CLICK_STREAM.
- B. Delete the table CLICK_STREAM, and then re-create it such that the column DT is of the TIMESTAMP type. Reload the data.
- C. Add two columns to the table CLICK_STREAM: TS of the TIMESTAMP type and IS_NEW of the BOOLEAN type. Reload all data in append mode. For each appended row, set the value of IS_NEW to true. For future queries, reference the column TS instead of the column DT, with the WHERE clause ensuring that the value of IS_NEW must be true.
- D. Add a column TS of the TIMESTAMP type to the table CLICK_STREAM, and populate the numeric values from the column DT for each row. Reference the column TS instead of the column DT from now on.
- E. Create a view CLICK_STREAM_V, where strings from the column DT are cast into TIMESTAMP values. Reference the view CLICK_STREAM_V instead of the table CLICK_STREAM from now on.
Answer: A
Explanation:
Writing the cast results to a destination table converts the data once, with a single query and no reload from CSV, and future queries read a native TIMESTAMP column. A view would repeat the cast on every query, and reloading all of the data repeats days of loading work.
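To make the goal of the conversion concrete, the sketch below shows in plain Python what the cast achieves: epoch values stored as strings become real timestamps, and a session duration is then a simple subtraction. It assumes the DT strings hold epoch seconds, which the question does not state, and the helper `to_timestamp` and the sample values are hypothetical. In BigQuery itself the cast would be done in SQL.

```python
from datetime import datetime, timezone


def to_timestamp(epoch_string: str) -> datetime:
    """Convert an epoch-seconds string (assumed unit) to a UTC timestamp."""
    return datetime.fromtimestamp(int(epoch_string), tz=timezone.utc)


# Hypothetical DT values for the clicks of one web session.
dt_values = ["1640390460", "1640390400", "1640391000"]

timestamps = sorted(to_timestamp(value) for value in dt_values)

# Session duration is the gap between the first and the last click.
session_seconds = (timestamps[-1] - timestamps[0]).total_seconds()
```

Comparing the raw strings would only work by accident when they all have the same number of digits, which is one reason to store the column with a proper timestamp type.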